Fail or warn on duplicated key during generation #841
Merged
byroot commented on Aug 23, 2025
```c
if (RB_UNLIKELY(state->space)) fbuffer_append_str(buffer, data->state->space);
generate_json(buffer, data, val);
/* ... */
arg->iter++;
```
Getting rid of this increment does offset the small overhead:
```
== Encoding activitypub.json (52595 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     3.019k i/100ms
Calculating -------------------------------------
               after     30.785k (± 2.5%) i/s (32.48 μs/i) - 153.969k in 5.004680s

Comparison:
              before:    30570.1 i/s
               after:    30784.6 i/s - same-ish: difference falls within error

== Encoding citm_catalog.json (500298 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   159.000 i/100ms
Calculating -------------------------------------
               after     1.630k (± 0.6%) i/s (613.58 μs/i) - 8.268k in 5.073274s

Comparison:
              before:     1603.6 i/s
               after:     1629.8 i/s - 1.02x faster

== Encoding twitter.json (466906 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   308.000 i/100ms
Calculating -------------------------------------
               after     3.088k (± 0.6%) i/s (323.86 μs/i) - 15.708k in 5.087390s

Comparison:
              before:     3041.1 i/s
               after:     3087.8 i/s - 1.02x faster
```
Because both string and symbol keys are serialized the same way, it has always been possible to generate documents with duplicated keys:
```ruby
>> puts JSON.generate({ foo: 1, "foo" => 2 })
{"foo":1,"foo":2}
```
This is almost always a mistake, and it can cause various issues because there is no guarantee of how different JSON parsers will handle such a document.
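As an illustration of that divergence, Ruby's own parser resolves the conflict with "last one wins", while other implementations may keep the first value or reject the document outright (recent versions of the json gem may also emit a warning here):

```ruby
require "json"

doc = '{"foo":1,"foo":2}'

# Ruby's parser keeps the last occurrence of the duplicated key;
# other JSON parsers are free to behave differently.
p JSON.parse(doc) # => {"foo"=>2}
```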
Until now I didn't think it was possible to catch this case without tanking performance, which is why I had only made the parser stricter. But I finally found a way to check for duplicated keys cheaply enough.
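The actual check lives in the C generator, but the idea can be sketched in plain Ruby. Here `check_duplicate_keys` is a hypothetical helper (not the PR's implementation) that tracks each serialized key string and fails on a repeat:

```ruby
require "json"
require "set"

# Hypothetical sketch: detect keys that serialize to the same JSON
# string (e.g. the symbol :foo and the string "foo") before emitting
# an object, and raise instead of producing a duplicated key.
def check_duplicate_keys(hash)
  seen = Set.new
  hash.each_key do |key|
    serialized = key.to_s
    # Set#add? returns nil when the element was already present.
    unless seen.add?(serialized)
      raise JSON::GeneratorError, "duplicated key: #{serialized.inspect}"
    end
  end
end

begin
  check_duplicate_keys({ foo: 1, "foo" => 2 })
rescue JSON::GeneratorError => e
  puts e.message
end
```

The real patch does the equivalent bookkeeping inside the generator loop, which is why keeping the per-entry overhead (such as the `arg->iter++` counter discussed above) cheap matters.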